Creating and controlling video-realistic talking heads
نویسندگان
چکیده
We present a linear three-dimensional modeling paradigm for lips and face, that captures the audiovisual speech activity of a given speaker by only six parameters. Our articulatory models are constructed from real data (front and profile images), using a linear component analysis of about 200 3D coordinates of fleshpoints on the subject's face and lips. Compared to a raw component analysis, our construction approach leads to somewhat more comparable relations across subjects: by construction, the six parameters have a clear phonetic/articulatory interpretation. We use such a speaker’s specific articulatory model to regularize MPEG-4 facial articulation parameters (FAP) and show that this regularization process can drastically reduce bandwidth, noise and quantization artifacts. We then present how analysis-by-synthesis techniques using the speaker-specific model allows the tracking of facial movements. Finally, the results of this tracking scheme have been used to develop a text-to-audiovisual speech system.
منابع مشابه
Photo-Realistic Talking-Heads from Image Samples
This paper describes a system for creating a photo-realistic model of the human head that can be animated and lip-synched from phonetic transcripts of text. Combined with a state-of-the-art text-to-speech synthesizer (TTS), it generates video animations of talking heads that closely resemble real people. To obtain a naturally looking head, we choose a “data-driven” approach. We record a talking...
متن کاملBuilding speaker-specific lip models for talking heads from 3d face data
When creating realistic talking head animations, accurate modeling of speech articulators is important for speech perceptibility. Previous lip modeling methods such as simple numerical lip modeling focus on creating a general lip model without incorporating lip speaker variations. Here we present a method for creating accurate speaker-specific lip representations that retain the individual char...
متن کاملDialog-driven video-realistic image-based eye animation
Talking-heads are useful to give a face to multimedia applications such as virtual operators or news readers in dialog systems. However, their great commercial potentials can only become true, if talking-heads are indistinguishable from real recorded videos and at the same time correctly model the human-like behavior. For this, mouth as well as nonverbal behaviors such as head movements, facial...
متن کاملSample-Based Synthesis of Photo-Realistic Talking Heads
This paper describes a system that generates photorealistic video animations of talking heads. First the system derives head models from existing video footage using image recognition techniques. It locates, extracts and labels facial parts such as mouth, eyes, and eyebrows into a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences...
متن کاملQuality of talking heads in different interaction and media contexts
We investigate the impact of three different factors on the quality of talking heads as metaphors of a spoken dialogue system in the smart home domain. The main focus lies on the effect of voice and head characteristics on audio and video quality, as well as overall quality. Furthermore, the influence of interactivity and of media context on user perception is analysed. For this purpose two sub...
متن کامل